
Supplementary Material for: T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models

Neural Information Processing Systems

Warning: This paper contains data and model outputs which are offensive in nature. Our work involves exposing human reviewers to upsetting content; therefore, we implement a series of safety measures to mitigate potential risks to human evaluators. Volunteers must be at least 18 years old, in good physical and mental health, and free from conditions such as heart disease or blood phobia. We inform volunteers in advance about the possibility of encountering distressing content, provide examples, and make it clear that they can withdraw from the study at any time without penalty if they feel uncomfortable. Evaluations take place in a well-lit, spacious room with videos displayed on 22-24 inch monitors.


A Systematic Review of NeurIPS Dataset Management Practices

Neural Information Processing Systems

As new machine learning methods demand larger training datasets, researchers and developers face significant challenges in dataset management. Although ethics reviews, documentation, and checklists have been established, it remains uncertain whether consistent dataset management practices exist across the community. This lack of a comprehensive overview hinders our ability to diagnose and address fundamental tensions and ethical issues related to managing large datasets. We present a systematic review of datasets published at the NeurIPS Datasets and Benchmarks track, focusing on four key aspects: provenance, distribution, ethical disclosure, and licensing. Our findings reveal that dataset provenance is often unclear due to ambiguous filtering and curation processes. Additionally, a variety of sites are used for dataset hosting, but only a few offer structured metadata and version control. These inconsistencies underscore the urgent need for standardized data infrastructures for the publication and management of datasets.



Credit Attribution and Stable Compression Roi Livni, Shay Moran

Neural Information Processing Systems

Credit attribution is crucial across various fields. In academic research, proper citation acknowledges prior work and establishes original contributions. Similarly, in generative models, such as those trained on existing artworks or music, it is important to ensure that any generated content influenced by these works appropriately credits the original creators. We study credit attribution by machine learning algorithms. We propose new definitions, relaxations of Differential Privacy, that weaken the stability guarantees for a designated subset of datapoints. These datapoints can be used non-stably with permission from their owners, potentially in exchange for compensation. Meanwhile, each of the remaining datapoints is guaranteed to have no significant influence on the algorithm's output. Our framework extends well-studied notions of stability, including Differential Privacy (the case where the designated subset is empty), differentially private learning with public data (where the public datapoints are fixed in advance), and stable sample compression (where the datapoints are selected adaptively by the algorithm). We examine the expressive power of these stability notions within the PAC learning framework, provide a comprehensive characterization of learnability for algorithms adhering to these principles, and propose directions and questions for future research.
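For reference, the standard (ε, δ)-Differential Privacy guarantee that these definitions relax (the notation here is the textbook formulation, not taken from the paper itself) says that for any randomized algorithm A, any two datasets S, S' differing in a single datapoint, and any event E over outputs:

```latex
\Pr[A(S) \in E] \;\le\; e^{\varepsilon}\,\Pr[A(S') \in E] + \delta
```

The relaxations described above keep this bound for the non-designated datapoints while allowing the designated subset to influence the output freely.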


Holistic Evaluation of Text-to-Image Models Tony Lee, Yifan Mai

Neural Information Processing Systems

The stunning qualitative improvement of text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on image-text alignment and image quality, we identify 12 aspects: text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at https://crfm.stanford.edu/heim/latest






Jinyang Li

Neural Information Processing Systems

Text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has gained increasing attention in recent years. In particular, GPT-4 and Claude-2 have shown impressive results in this task. However, most of the prevalent benchmarks, e.g., Spider and WikiSQL, focus on database schemas with only a few rows of database values, leaving a gap between academic study and real-world applications.
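To make the task concrete, the sketch below pairs a natural-language question with the executable SQL a parser should produce, and runs it against a toy in-memory database. The schema, data, and question are illustrative assumptions, not examples drawn from Spider or WikiSQL.

```python
import sqlite3

# Toy database standing in for a benchmark schema (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (
        singer_id INTEGER PRIMARY KEY,
        name      TEXT,
        country   TEXT,
        age       INTEGER
    );
    INSERT INTO singer VALUES
        (1, 'Ana', 'Brazil', 29),
        (2, 'Liu', 'China', 34),
        (3, 'Mia', 'Brazil', 41);
""")

# Natural-language question a text-to-SQL parser receives as input:
question = "How many singers are from Brazil?"

# Executable SQL a correct parser should output for this schema:
predicted_sql = "SELECT COUNT(*) FROM singer WHERE country = 'Brazil'"

# Benchmarks typically score predictions by executing them and
# comparing results against a gold query's output.
(count,) = conn.execute(predicted_sql).fetchone()
print(count)  # 2
```

Execution-based comparison like this is what makes sparse database values a problem: with only a few rows, many incorrect SQL queries happen to return the correct result.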